fix: Fix overflow for vertex/edge insertion #8
Open
zhanglei1949 wants to merge 33 commits into main
Pull request overview
This PR targets the 4096 insert failure (issue #7) by introducing proactive capacity growth for vertex/edge storage structures so that sequential CREATE statements can continue past the default reserved space.
Changes:
- Add `EnsureCapacity` APIs for vertex/edge storage and call them before inserts to avoid “reserved space exhausted” failures.
- Introduce edge-table capacity tracking for unbundled edges and expose capacity/size helpers across storage layers.
- Add/extend Python binding tests for high-volume vertex/edge insertions; adjust some logging/error handling paths.
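To illustrate the "ensure capacity before insert" idea, here is a minimal, self-contained sketch. `ToyVertexTable`, `grow_capacity`, `InsertWithGrowth`, the 25% growth step, and the floor of 16 are illustrative assumptions rather than the PR's actual code; only the name `EnsureCapacity` and the 4096-slot failure mode are taken from the overview above.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical growth helper: grow by ~25%, saturating instead of wrapping.
inline std::size_t grow_capacity(std::size_t current) {
  constexpr std::size_t kMax = std::numeric_limits<std::size_t>::max();
  constexpr std::size_t kMin = 16;  // assumed floor for tiny tables
  if (current < kMin) return kMin;
  std::size_t step = current / 4;          // 25% of the current capacity
  if (current > kMax - step) return kMax;  // avoid size_t overflow
  return current + step;
}

// Toy stand-in for a fixed-capacity vertex table (not the real VertexTable).
class ToyVertexTable {
 public:
  explicit ToyVertexTable(std::size_t reserved) : slots_(reserved), size_(0) {}

  std::size_t Size() const { return size_; }
  std::size_t Capacity() const { return slots_.size(); }

  // Grow storage until at least `required` slots exist; never shrinks.
  void EnsureCapacity(std::size_t required) {
    std::size_t cap = Capacity();
    while (cap < required) cap = grow_capacity(cap);
    if (cap != Capacity()) slots_.resize(cap);
  }

  // Fails once the reserved space is exhausted, mimicking issue #7.
  bool AddVertex(long oid) {
    if (size_ >= Capacity()) return false;
    slots_[size_++] = oid;
    return true;
  }

 private:
  std::vector<long> slots_;
  std::size_t size_;
};

// Caller-side pattern: make room first, then insert.
inline bool InsertWithGrowth(ToyVertexTable& table, long oid) {
  table.EnsureCapacity(table.Size() + 1);
  return table.AddVertex(oid);
}
```

With a table reserved for 4096 slots, a plain `AddVertex` fails on the 4097th insert, while `InsertWithGrowth` first grows the table and then succeeds. Growing in proportional steps keeps the cost of resizing amortized across many inserts.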
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/python_bind/tests/test_tp_service.py | Adds TP service stress tests for inserting many vertices/edges; also updates execute() call style to pass access_mode. |
| tools/python_bind/tests/test_db_query.py | Adds embedded/local test for inserting many edges and validating count. |
| src/utils/property/column.cc | Fixes missing return in TypedEmptyColumn<T>::get_view. |
| src/transaction/update_transaction.cc | Calls graph_.EnsureCapacity(...) before vertex/edge inserts. |
| src/storages/graph/vertex_table.cc | Adds VertexTable::EnsureCapacity helper. |
| src/storages/graph/property_graph.cc | Adds PropertyGraph::EnsureCapacity for vertices/edges and improves vertex add error message. |
| src/storages/graph/graph_interface.cc | Calls EnsureCapacity in AP update interface before vertex/edge inserts. |
| src/storages/graph/edge_table.cc | Tracks/resizes unbundled edge property-table capacity and adds EdgeTable::EnsureCapacity. |
| src/storages/csr/mutable_csr.cc | Adds CSR capacity() API implementation. |
| src/storages/csr/immutable_csr.cc | Adds CSR capacity() API implementation. |
| src/server/neug_db_session.cc | Adds exception handling around transaction commit and returns structured internal errors. |
| src/compiler/planner/gopt_planner.cc | Downgrades compilePlan query logging from INFO to VLOG(1). |
| include/neug/storages/graph/vertex_table.h | Declares EnsureCapacity and adds Size() accessor. |
| include/neug/storages/graph/property_graph.h | Declares EnsureCapacity APIs (vertex and edge). |
| include/neug/storages/graph/edge_table.h | Declares EnsureCapacity, adds size/capacity helpers and capacity tracking member. |
| include/neug/storages/csr/csr_base.h | Adds pure virtual capacity() API for CSRs. |
| include/neug/storages/csr/mutable_csr.h | Declares capacity() overrides for mutable CSRs. |
| include/neug/storages/csr/immutable_csr.h | Declares capacity() overrides for immutable CSRs. |
3fee21b to
12a19e3
Compare
Collaborator
Author
|
@greptile |
db68d39 to
ac3638f
Compare
Collaborator
Author
|
@greptile |
Committed-by: xiaolei.zl from Dev container Committed-by: xiaolei.zl from Dev container add capacity api Committed-by: xiaolei.zl from Dev container fix CI Committed-by: xiaolei.zl from Dev container remove tests for TP Committed-by: xiaolei.zl from Dev container Committed-by: xiaolei.zl from Dev container fixing Committed-by: xiaolei.zl from Dev container Committed-by: xiaolei.zl from Dev container Committed-by: xiaolei.zl from Dev container fix Update src/storages/graph/edge_table.cc Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> fixes fix use explicit capacity calculation for EnsureCapacity
3244938 to
a2f627c
Compare
Collaborator
Author
|
@greptile |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Collaborator
Author
|
@greptile |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Collaborator
Author
|
@greptile |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Louyk14
pushed a commit
that referenced
this pull request
Mar 12, 2026
- Add `.clang-format`. - Remove constraint of singleton `GraphDB`.
Committed-by: xiaolei.zl from Dev container
…acOS (#29) * Generate -I for some third-party libs (e.g., protobuf, arrow) * fix
* enable codecov Committed-by: nengli.ln from Dev container Committed-by: nengli.ln from Dev container * add ignore for coverage Committed-by: nengli.ln from Dev container Committed-by: nengli.ln from Dev container
* support varchar(max_length) in NeuG type system Committed-by: Xiaoli Zhou from Dev container * skip specific regex when comparing physical plans Committed-by: Xiaoli Zhou from Dev container * minor fix according to review Committed-by: Xiaoli Zhou from Dev container * fix ci Committed-by: xiaolei.zl from Dev container Committed-by: xiaolei.zl from Dev container * avoid change func api Committed-by: xiaolei.zl from Dev container * ci strange failure Committed-by: xiaolei.zl from Dev container --------- Co-authored-by: xiaolei.zl <xiaolei.zl@alibaba-inc.com>
* add test for inserting string in embedding mode Committed-by: xiaolei.zl from Dev container * format Committed-by: xiaolei.zl from Dev container * Update tools/python_bind/tests/test_db_query.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update tools/python_bind/tests/test_db_query.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update tools/python_bind/tests/test_db_query.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update tools/python_bind/tests/test_db_query.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * format Committed-by: xiaolei.zl from Dev container * fix Committed-by: xiaolei.zl from Dev container --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Collaborator
Author
|
@greptile |
Fix #7
Related Greptile view at https://github.com/GraphScope/neug/pull/1725
Greptile Summary
This PR fixes overflow bugs for vertex and edge insertions by introducing a centralized `calculate_new_capacity` growth policy (`growth.h`), explicit `EnsureCapacity` methods on `VertexTable`, `EdgeTable`, and `PropertyGraph`, and a statistics file (capacity + size) persisted at checkpoint time so the allocated capacity survives a dump/reopen cycle. It also removes the previous pattern of unconditionally resizing tables on every insertion and replaces it with amortized pre-allocation.

Key changes:
- `growth.h`: overflow-safe growth helpers (25% for vertex tables, ~20% for edge tables), replacing ad-hoc doublings scattered across the code.
- `EdgeTable`: added `capacity_` atomic, `EnsureCapacity`, `Size`, and `Capacity`; statistics file written to checkpoint and validated on open.
- `VertexTable`/`PropertyGraph`: `EnsureCapacity` overloads replace the previous hard-coded `Reserve(Capacity * 2)` calls.
- `id_indexer.h`: removed automatic 25% over-allocation on `open()`; capacity management is now entirely caller-driven.
- `mmap_array<string_view>`: compaction no longer shrinks the in-memory buffer (enabling in-place appends); `ftruncate` in `stream_compact_and_dump` preserves the full buffer size in the output file; `pos_` is now persisted as a separate `.pos` file rather than recomputed from `data_size()`.
- `column.cc`: fixed missing `return` in `TypedEmptyColumn::get_view`.

Issues found:
- `update_transaction.cc` `AddEdge`: the WAL entry is serialized and `op_num_` incremented before the `EnsureCapacity` check, unlike `AddVertex`, which was corrected to check first. A failing `EnsureCapacity` leaves a dangling WAL entry.
- `column.h` `init_pos`: when the `.pos` file is absent, `pos_` is initialized to `0` instead of `buffer_.data_size()`, causing new string inserts to overwrite existing data in any column opened from a checkpoint that predates the `.pos` file.
- `mmap_array.h` `avg_size()`: iterates all items on every call; used inside the resize path in `TypedColumn<string_view>::resize`, making capacity growth O(n) in the number of existing strings.

Confidence Score: 3/5
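The `AddEdge` issue listed above comes down to ordering: validate and reserve capacity before appending anything to the redo log. The sketch below is a toy model under stated assumptions. `ToyTransaction`, its string-vector log, and the `allow_growth` switch are hypothetical and do not reflect the real `UpdateTransaction`, `InsertEdgeRedo`, or WAL format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy transaction: a redo log plus an operation counter (hypothetical types).
struct ToyTransaction {
  std::vector<std::string> redo_log;  // stands in for the serialized WAL buffer
  std::size_t op_num = 0;
  std::size_t edge_capacity = 0;      // number of edges storage can hold
  std::size_t edge_count = 0;
  bool allow_growth = true;           // set to false to simulate a failure

  // Stand-in for a capacity check that can fail.
  bool EnsureCapacity(std::size_t required) {
    if (required <= edge_capacity) return true;
    if (!allow_growth) return false;
    edge_capacity = required;
    return true;
  }

  // Problematic ordering: log first, check second.
  bool AddEdgeLogFirst(const std::string& edge) {
    redo_log.push_back(edge);
    ++op_num;
    if (!EnsureCapacity(edge_count + 1)) return false;  // log entry dangles
    ++edge_count;
    return true;
  }

  // Suggested ordering: check first, log only once the insert can proceed.
  bool AddEdgeCheckFirst(const std::string& edge) {
    if (!EnsureCapacity(edge_count + 1)) return false;  // nothing logged yet
    redo_log.push_back(edge);
    ++op_num;
    ++edge_count;
    return true;
  }
};
```

When the capacity check fails, `AddEdgeLogFirst` returns `false` but leaves one log entry and a bumped `op_num` behind, whereas `AddEdgeCheckFirst` leaves the transaction untouched.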
Important Files Changed

… `return` statement so it no longer falls off the end of a non-void function.

Sequence Diagram

    sequenceDiagram
        participant Caller
        participant UpdateTransaction
        participant PropertyGraph
        participant VertexTable
        participant EdgeTable
        Note over Caller,EdgeTable: AddVertex flow (corrected in this PR)
        Caller->>UpdateTransaction: AddVertex(label, oid, props)
        UpdateTransaction->>PropertyGraph: EnsureCapacity(label)
        PropertyGraph->>VertexTable: EnsureCapacity(capacity)
        VertexTable->>VertexTable: Reserve(capacity)
        VertexTable-->>PropertyGraph: new_capacity
        PropertyGraph-->>UpdateTransaction: Status::OK
        UpdateTransaction->>UpdateTransaction: InsertVertexRedo::Serialize + op_num_++
        UpdateTransaction->>PropertyGraph: AddVertex(...)
        Note over Caller,EdgeTable: AddEdge flow (ordering issue remains)
        Caller->>UpdateTransaction: AddEdge(src, dst, edge_label, props)
        UpdateTransaction->>UpdateTransaction: InsertEdgeRedo::Serialize + op_num_++ ⚠️
        UpdateTransaction->>PropertyGraph: EnsureCapacity(src, dst, edge_label)
        PropertyGraph->>EdgeTable: EnsureCapacity(capacity)
        EdgeTable->>EdgeTable: table_->resize(capacity)
        EdgeTable-->>PropertyGraph: ok
        PropertyGraph-->>UpdateTransaction: Status::OK
        UpdateTransaction->>PropertyGraph: AddEdge(...)
        Note over Caller,EdgeTable: BatchAddVertices flow
        Caller->>PropertyGraph: BatchAddVertices(v_label, supplier)
        PropertyGraph->>VertexTable: insert_vertices(supplier)
        loop per batch
            VertexTable->>VertexTable: EnsureCapacity(calculate_new_capacity(new_size))
            VertexTable->>VertexTable: insert_primary_keys + set_properties
        end

Comments Outside Diff (6)
`src/transaction/update_transaction.cc`, line 738-746
Redo log serialized before capacity check: inconsistent transaction state on failure

`InsertEdgeRedo::Serialize` and `op_num_ += 1` are executed on lines 738–741 before `EnsureCapacity` is checked on line 742. If `EnsureCapacity` returns an error (e.g., the edge label triplet doesn't exist), the function returns `false` but the redo log (`arc_`) already contains a serialized edge entry and `op_num_` has been incremented. This leaves the transaction's redo log in an inconsistent state and could cause incorrect replay during crash recovery.

This is directly inconsistent with the pattern used in the `AddVertex` function in this same file (lines 683–691), which correctly calls `EnsureCapacity` before calling `InsertVertexRedo::Serialize` and incrementing `op_num_`.

The fix is to move the `EnsureCapacity` call (and early return on failure) before the serialize/increment.

`src/transaction/update_transaction.cc`, line 738-746
Redo log serialized before capacity check: transaction log corruption on failure

`InsertEdgeRedo::Serialize` (line 738) and `op_num_ += 1` (line 741) execute before `EnsureCapacity` is checked. If `EnsureCapacity` returns an error and the function returns `false`, the redo log entry has already been appended and `op_num_` incremented, but the edge was never actually inserted. This leaves the transaction log in an inconsistent state and could cause incorrect replay during recovery.

This is directly inconsistent with the fixed `AddVertex` path in the same file (lines 683–692), where serialization is correctly placed after the capacity check. The `AddEdge` block should follow the same pattern.

`src/transaction/update_transaction.cc`, line 738-746
Redo log serialized before capacity check

`InsertEdgeRedo::Serialize` (line 738) and `op_num_ += 1` (line 741) are called before `EnsureCapacity` (line 742). If `EnsureCapacity` fails and the function returns `false`, the WAL already contains the edge-insert entry and `op_num_` has been incremented, but the edge was never actually written to the graph.

On WAL replay after a crash at this point, the replayed edge insertion would fail again (or worse, succeed spuriously with uninitialised capacity), corrupting the graph state.

The `AddVertex` path (line 683) does this correctly: it calls `EnsureCapacity` first, and only then serializes the redo entry. `AddEdge` should follow the same ordering.

`src/storages/graph/edge_table.cc`, line 263-264
`reserve` reserves 0 instead of the intended size

`edge_data` is freshly default-constructed, so `edge_data.size()` is `0`. The call `edge_data.reserve(edge_data.size())` is a no-op and the intent was clearly to pre-allocate `property_vec.size()` slots to avoid repeated reallocations during the loop below.

`include/neug/utils/mmap_array.h`, line 365-380
`avg_size()` is O(n) and is called in a resize hot path

`avg_size()` iterates over every item in the buffer to compute the average string length. This is called from `TypedColumn<std::string_view>::resize`, which is invoked from `insert_vertices_impl` whenever the vertex table grows. For tables with millions of string-typed properties, this becomes an O(n) scan inside what is essentially the capacity-growth logic, adding significant overhead that scales with the number of existing entries.

Consider tracking the running total length incrementally (e.g., as an atomic member updated on each `set` call) rather than recomputing it on every call. Alternatively, cap the sample to the first N entries (e.g., 1024) to bound the cost.
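One way to realize the incremental tracking suggested above is sketched here. `ToyStringColumn` and its members are hypothetical and stand in for `mmap_array<std::string_view>`, whose real layout is not shown on this page. Averaging over non-empty slots only is also an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical string column that keeps a running byte total so that
// avg_size() is O(1) instead of scanning every item.
// The atomics keep the counters coherent; writes to the same slot from
// several threads would still need external synchronization.
class ToyStringColumn {
 public:
  explicit ToyStringColumn(std::size_t n) : items_(n) {}

  // Overwrite slot `idx`, adjusting the counters by the length delta.
  void set(std::size_t idx, std::string_view value) {
    assert(idx < items_.size());
    std::string& slot = items_[idx];
    if (slot.empty() && !value.empty()) {
      non_empty_.fetch_add(1, std::memory_order_relaxed);
    } else if (!slot.empty() && value.empty()) {
      non_empty_.fetch_sub(1, std::memory_order_relaxed);
    }
    total_bytes_.fetch_sub(slot.size(), std::memory_order_relaxed);
    total_bytes_.fetch_add(value.size(), std::memory_order_relaxed);
    slot.assign(value.data(), value.size());
  }

  // Average length over non-empty slots; 0 when nothing has been stored.
  std::size_t avg_size() const {
    std::size_t n = non_empty_.load(std::memory_order_relaxed);
    return n == 0 ? 0 : total_bytes_.load(std::memory_order_relaxed) / n;
  }

 private:
  std::vector<std::string> items_;
  std::atomic<std::size_t> total_bytes_{0};
  std::atomic<std::size_t> non_empty_{0};
};
```

The trade-off is a small amount of bookkeeping on every `set` in exchange for a constant-time query on the resize path.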
`include/neug/utils/property/column.h`, line 522-527
`pos_` initialized to 0 when `.pos` file is absent: corrupts strings on first post-dump insert

When the `.pos` file does not exist (e.g., first open from a freshly-created or migrated database before any `Dump`), `pos_` is set to `0`. If the column was opened from an existing checkpoint that already contains string data (the `.items` and `.data` files were copied from elsewhere), new string inserts will start writing at offset `0`, silently overwriting all previously stored string values.

The pre-existing behavior in all four `open*` paths safely positioned the write cursor at the end of existing data. The new `init_pos` no longer does this as a fallback.

When the `.pos` file does not exist, falling back to `buffer_.data_size()` (the actual used extent) rather than `0` would be the safe default.

Last reviewed commit: 5d6db2c
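The fallback described in the `column.h` comment above can be sketched as follows. `init_pos_from`, its parameters, and the plain-text encoding of the `.pos` file are assumptions made for illustration; the real `init_pos` signature and on-disk format are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: choose the append cursor for a string buffer.
// If a `.pos` side file exists and parses, trust it; otherwise fall back to
// the number of bytes already in use, so appends never overwrite old data.
inline std::size_t init_pos_from(const std::string& pos_path,
                                 std::size_t used_bytes) {
  std::ifstream in(pos_path);
  std::size_t stored = 0;
  if (in && (in >> stored)) {
    return stored;
  }
  return used_bytes;  // safe default for checkpoints without a `.pos` file
}
```

Here `used_bytes` plays the role of `buffer_.data_size()`: a column opened from an older checkpoint starts appending at the end of its existing data instead of at offset `0`.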